the new merb-cache

September 07, 2008 | Author: benburkert | Publisher: hassox


The Merb project has been updated with a radical new caching system. Up until now, handling caching in your Merb app has been almost identical to caching in Rails, and most other web frameworks for that matter. The best approaches to caching often required code for expiring content and handling cache misses to be spread throughout the application. Furthermore, the code required for each caching mechanism differed in small ways, which made it difficult to switch from page caching to action caching in your application, or vice versa. Merb needed a new approach to caching that focuses on modularity and performance, and that’s what the new merb-cache aims to be.

Right now, the new merb-cache lives on its own branch of the main merb-more repository. You can follow along here: merb-cache branch

The new merb-cache

merb-cache was rewritten with a few goals in mind:
  1. make it modular
  2. define a public API
  3. do the heavy lifting on key generation
  4. make it 100% thread-safe
  5. work with multiple caching layers through the same API
  6. make it easy to keep a “hot” cache without the overhead
  7. keep it hackable

Stores

First and foremost, cache stores have been separated into two families: fundamental stores and strategy stores. A fundamental store is any store that interacts directly with the persistence layer. The FileStore, for example, is a fundamental store that reads & writes cache entries to the file system. MemcachedStore is also a fundamental store. Fundamental stores work almost exactly like the existing caching techniques, except that they implement a common API defined by AbstractStore.

The strategy store is the new kid on the block. A strategy store wraps one or more fundamental stores, acting as a middleman between cache requests and the stores that actually persist the data. For example, if you need to save memory on your Memcache server, you could wrap your MemcachedStore with a GzipStore. This would automatically compress the cached data when it is put into the cache, and decompress it on the way out. You can even wrap strategy stores with other strategy stores. If your key contained sensitive information, like an SSN, you might want to hash the key before storage. Wrapping your GzipStore in a SHA1Store would take care of that for you.
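To make the wrapping idea concrete, here is a minimal sketch in plain Ruby. The classes `HashStore`, `GzipWrapper` and `SHA1KeyWrapper` are hypothetical stand-ins, not merb-cache's own MemcachedStore, GzipStore or SHA1Store, and they only model `read` and `write`. The point is that each strategy exposes the same interface as the store it wraps, so strategies can be stacked.

```ruby
require 'zlib'
require 'digest/sha1'

# Hypothetical fundamental store: keeps entries in a plain Hash.
class HashStore
  def initialize
    @data = {}
  end

  def write(key, data)
    @data[key] = data
    true
  end

  def read(key)
    @data[key]
  end

  def keys
    @data.keys
  end
end

# Hypothetical strategy store: gzips values on the way in, gunzips them on the way out.
class GzipWrapper
  def initialize(store)
    @store = store
  end

  def write(key, data)
    @store.write(key, Zlib.gzip(data))
  end

  def read(key)
    raw = @store.read(key)
    raw && Zlib.gunzip(raw)
  end
end

# Hypothetical strategy store: replaces every key with its SHA1 hex digest.
class SHA1KeyWrapper
  def initialize(store)
    @store = store
  end

  def write(key, data)
    @store.write(Digest::SHA1.hexdigest(key), data)
  end

  def read(key)
    @store.read(Digest::SHA1.hexdigest(key))
  end
end

backend = HashStore.new
store = SHA1KeyWrapper.new(GzipWrapper.new(backend))
store.write("ssn:123-45-6789", "cached profile")
store.read("ssn:123-45-6789") # returns "cached profile"
```

After the write, the backing hash holds only a 40-character hex digest as its key and compressed bytes as its value; the raw SSN never reaches the fundamental store.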

Public API

The AbstractStore class defines 9 methods as the API:

  1. writable?(key, parameters = {}, conditions = {})
  2. exists?(key, parameters = {})
  3. read(key, parameters = {})
  4. write(key, data = nil, parameters = {}, conditions = {})
  5. write_all(key, data = nil, parameters = {}, conditions = {})
  6. fetch(key, parameters = {}, conditions = {}, &blk)
  7. delete(key, parameters = {})
  8. delete_all
  9. delete_all!

AbstractStrategyStore implements all of these except delete_all. If a strategy store can guarantee that calling delete_all on its wrapped store(s) will delete only the entries written through that strategy store, it may define the safe version, delete_all. However, this is usually not the case, hence delete_all is not part of the public API for AbstractStrategyStore.

More detailed documentation for each method can be found here: AbstractStrategyStore
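As a rough illustration of that contract, the sketch below implements the nine methods over a Hash guarded by a Mutex. `MemoryStore` and its `key?param=value` key format are assumptions made for this example, not merb-cache's real key generation. Because it supports no write conditions, `writable?` returns false whenever conditions such as `:expire_in` are given, which mirrors how a FileStore refuses them.

```ruby
# Hypothetical in-memory store following the nine-method API listed above.
class MemoryStore
  def initialize
    @entries = {}
    @mutex = Mutex.new # guards @entries across threads
  end

  # This store honours no write conditions at all.
  def writable?(key, parameters = {}, conditions = {})
    conditions.empty?
  end

  def exists?(key, parameters = {})
    @mutex.synchronize { @entries.key?(normalize(key, parameters)) }
  end

  def read(key, parameters = {})
    @mutex.synchronize { @entries[normalize(key, parameters)] }
  end

  def write(key, data = nil, parameters = {}, conditions = {})
    return false unless writable?(key, parameters, conditions)
    @mutex.synchronize { @entries[normalize(key, parameters)] = data }
    true
  end

  # With a single backing store, write_all behaves like write.
  def write_all(key, data = nil, parameters = {}, conditions = {})
    write(key, data, parameters, conditions)
  end

  # Read-through: return the cached value, or compute it with the block and store it.
  def fetch(key, parameters = {}, conditions = {}, &blk)
    return read(key, parameters) if exists?(key, parameters)
    value = blk.call
    write(key, value, parameters, conditions)
    value
  end

  def delete(key, parameters = {})
    k = normalize(key, parameters)
    @mutex.synchronize do
      next false unless @entries.key?(k)
      @entries.delete(k)
      true
    end
  end

  def delete_all
    @mutex.synchronize { @entries.clear }
    true
  end

  def delete_all!
    delete_all
  end

  private

  # Hypothetical key scheme: the key followed by sorted parameters, e.g. "tags?page=2".
  def normalize(key, parameters)
    return key.to_s if parameters.empty?
    query = parameters.map { |k, v| "#{k}=#{v}" }.sort.join("&")
    "#{key}?#{query}"
  end
end
```

Because the parameters are sorted before being joined, `{:page => 2, :per_page => 20}` and `{:per_page => 20, :page => 2}` map to the same entry.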

Less Talk, More Code

So here’s how you can set up and use merb-cache in your Merb app:

config/environments/development.rb


  # create a fundamental memcache store named :memcached for localhost
  Merb::Cache.setup(:memcached, Merb::Cache::MemcachedStore, {
    :namespace => "my_app",
    :servers => ["127.0.0.1:11211"]
  })

  # a default FileStore
  Merb::Cache.setup(Merb::Cache::FileStore)

  # another FileStore
  Merb::Cache.setup(:tmp_cache, Merb::Cache::FileStore, :dir => "/tmp")

Now let’s use these in a model:

app/models/tag.rb


  class Tag
    #...

    def find(parameters = {})
      # poor man's identity map

      if Merb::Cache[:memcached].exists?("tags", parameters)
        Merb::Cache[:memcached].read("tags", parameters)
      else
        returning(super(parameters)) do |results|
          Merb::Cache[:memcached].write("tags", results, parameters)
        end
      end
    end

    def popularity_rating
      # let's keep the popularity rating cached for 30 seconds
      # merb-cache will create a key from the model's id & the interval parameter

      Merb::Cache[:memcached].fetch(self.id, :interval => Time.now.to_i / 30) do
        self.run_long_popularity_rating_query
      end
    end
  end
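The `:interval` trick relies on integer division: `Time.now.to_i / 30` returns the same integer for every second of a 30-second window, so the generated key, and therefore the cache entry, changes only when the window rolls over. A small sketch, where `interval_bucket`, `popularity_key` and the `id?interval=n` key format are hypothetical:

```ruby
# Integer division maps every timestamp in the same 30-second window to one value.
def interval_bucket(time, seconds = 30)
  time.to_i / seconds
end

# Hypothetical key format combining the id and the bucket; the real key
# string merb-cache builds depends on the store.
def popularity_key(id, time, seconds = 30)
  "#{id}?interval=#{interval_bucket(time, seconds)}"
end

t0 = Time.at(1_220_000_010)           # 1_220_000_010 is a multiple of 30
interval_bucket(t0) == interval_bucket(t0 + 29)  # same window, same key
interval_bucket(t0 + 29) == interval_bucket(t0 + 30)  # window rolled over, new key
```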

Or, if you want to use memcache’s built-in expire option:


  # expire a cache entry for "bar" (identified by the key "foo" and
  # parameters {:baz => :bay}) in two hours
  Merb::Cache[:memcached].write("foo", "bar", {:baz => :bay}, :expire_in => 2.hours)

  # this will fail, because FileStore cannot expire cache entries
  Merb::Cache[:default].write("foo", "bar", {:baz => :bay}, :expire_in => 2.hours)

  # writing to the FileStore will fail, but the MemcachedStore will succeed
  Merb::Cache[:default, :memcached].write("foo", "bar", {:baz => :bay}, :expire_in => 2.hours)

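  # this will fail, because write_all requires every store to accept the
  # write, and FileStore cannot expire cache entries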
  Merb::Cache[:default, :memcached].write_all("foo", "bar", {:baz => :bay}, :expire_in => 2.hours)
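The difference between `write` and `write_all` on a list of stores can be modeled as "the first store that accepts" versus "every store must accept". In the sketch below, `ExpiringStore`, `NonExpiringStore` and `AdhocSketch` are hypothetical stand-ins for MemcachedStore, FileStore and `Merb::Cache[:default, :memcached]`. Whether merb-cache still writes to the accepting stores when `write_all` fails is an assumption here; this sketch attempts every store.

```ruby
# Hypothetical stand-in for a store that honours :expire_in (like memcached).
class ExpiringStore
  attr_reader :entries

  def initialize
    @entries = {}
  end

  def writable?(key, parameters = {}, conditions = {})
    (conditions.keys - [:expire_in]).empty?
  end

  def write(key, data = nil, parameters = {}, conditions = {})
    return false unless writable?(key, parameters, conditions)
    @entries[[key, parameters]] = data
    true
  end
end

# Hypothetical stand-in for a store that cannot expire entries (like FileStore).
class NonExpiringStore < ExpiringStore
  def writable?(key, parameters = {}, conditions = {})
    conditions.empty?
  end
end

# write stops at the first store that accepts; write_all succeeds only if all accept.
class AdhocSketch
  def initialize(*stores)
    @stores = stores
  end

  def write(key, data = nil, parameters = {}, conditions = {})
    @stores.any? { |s| s.write(key, data, parameters, conditions) }
  end

  def write_all(key, data = nil, parameters = {}, conditions = {})
    @stores.map { |s| s.write(key, data, parameters, conditions) }.all?
  end
end

cache = AdhocSketch.new(NonExpiringStore.new, ExpiringStore.new)
cache.write("foo", "bar", {:baz => :bay}, {:expire_in => 7200})     # true: the second store accepts
cache.write_all("foo", "bar", {:baz => :bay}, {:expire_in => 7200}) # false: the first store refuses
```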

Strategy Stores

Setting up strategy stores is very similar to fundamental stores:

config/environments/development.rb


  # wraps the :memcached store we setup earlier
  Merb::Cache.setup(:zipped, Merb::Cache::GzipStore[:memcached])

  # wrap a strategy store
  Merb::Cache.setup(:sha_and_zip, Merb::Cache::SHA1Store[:zipped])

  # you can even use unnamed fundamental stores
  Merb::Cache.setup(:zipped_images, Merb::Cache::GzipStore[Merb::Cache::FileStore],
                    :dir => Merb.root / "public" / "images")

  # or a combination of strategy & fundamental stores
  module Merb::Cache # makes things a bit shorter

    setup(:secured, SHA1Store[GzipStore[FileStore], FileStore],
          :dir => Merb.root / "private")
  end

You can use these strategy stores exactly like fundamental stores in your app code.
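The `Store[...]` syntax reads like a class-level `[]` method that returns a configured subclass. One way that could work is sketched below; `StrategySketch`, `GzipSketch`, `SHA1Sketch` and `FileStoreSketch` are hypothetical, and merb-cache's actual implementation may differ. Wrapped classes are instantiated with the same options passed to `new`, and anything that is not a class (such as a named-store symbol, which merb-cache would look up) is passed through unchanged.

```ruby
# Hypothetical base class: SomeStrategy[inner, ...] returns an anonymous subclass
# that remembers which store(s) it wraps.
class StrategySketch
  class << self
    attr_accessor :wrapped
  end

  def self.[](*stores)
    klass = Class.new(self)
    klass.wrapped = stores
    klass
  end

  attr_reader :stores

  def initialize(config = {})
    # Build each wrapped class with the same config; pass non-classes through.
    @stores = (self.class.wrapped || []).map { |s| s.is_a?(Class) ? s.new(config) : s }
  end
end

# Hypothetical fundamental store that only records its directory option.
class FileStoreSketch
  attr_reader :dir

  def initialize(config = {})
    @dir = config[:dir]
  end
end

class GzipSketch < StrategySketch; end
class SHA1Sketch < StrategySketch; end

# Mirrors: setup(:secured, SHA1Store[GzipStore[FileStore], FileStore], :dir => ...)
secured = SHA1Sketch[GzipSketch[FileStoreSketch], FileStoreSketch].new(:dir => "/private")
```

Returning a fresh subclass per `[]` call keeps each composition independent, so `GzipSketch[FileStoreSketch]` and `GzipSketch[SomethingElse]` never share configuration.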

Action & Page Caching

Action & page caching have been implemented in strategy stores. So instead of manually specifying which type of caching you want for each action, you simply ask merb-cache to cache your action, and it will use the fastest cache available.

First, let’s set up our page & action stores:

config/environments/development.rb


  # the order that stores are setup is important
  # faster stores should be setup first

  # page cache to the public dir
  Merb::Cache.setup(:page_store, Merb::Cache::PageStore[Merb::Cache::FileStore],
                    :dir => Merb.root / "public")

  # action cache to memcache
  Merb::Cache.setup(:action_store, Merb::Cache::ActionStore[:sha_and_zip])

  # sets up the ordering of stores when attempting to read/write cache entries
  Merb::Cache.setup(:default, Merb::Cache::AdhocStore[:page_store, :action_store])

And now in our controller:

  class Tags < Merb::Controller

    # index & show will be page cached to the public dir. The index
    # action has no parameters, and the show action's parameters are part of
    # the URL, making them both page-cacheable
    cache :index, :show

    def index
      render
    end

    def show(slug)
      display Tag.first(:slug => slug)
    end
  end

Our controller now page caches both the index & show actions. Furthermore, the show action is cached separately for each slug parameter automatically.


  class Tags < Merb::Controller

    # term is a route param, while page & per_page are part of the query string.
    # If only term is supplied, the request can be page cached, but if page and/or
    # per_page are in the query string, the request will be action cached.
    cache :catalog

    def catalog(term = 'a', page = 1, per_page = 20)
      @tags = Tag.for_term(term).paginate(page, per_page)

      display @tags
    end
  end

Because the specific type of caching is not specified, the same action can either be page cached or action cached depending on the context of the request.
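In the comments below, the author explains that the page strategy keys on the request URI and the action strategy on the controller and action name. A minimal sketch of how a request could be routed to one or the other, assuming that a request with no query string is page-cacheable; `cache_key_for` and its return values are hypothetical:

```ruby
require 'uri'

# Hypothetical helper: choose page or action caching for a request.
# A URI without a query string maps cleanly onto a static file under public/,
# so it can be page cached; anything else falls back to the action cache.
def cache_key_for(uri_string, controller, action, params)
  uri = URI.parse(uri_string)
  if uri.query.nil? || uri.query.empty?
    [:page, uri.path]
  else
    query = params.map { |k, v| "#{k}=#{v}" }.sort.join("&")
    [:action, "#{controller}/#{action}?#{query}"]
  end
end

cache_key_for("/tags/catalog/b", "tags", "catalog", {"term" => "b"})
# page cached under the URI path
cache_key_for("/tags/catalog/b?page=2", "tags", "catalog", {"term" => "b", "page" => "2"})
# action cached under the controller, action and sorted params
```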

Keeping a “Hot” Cache

Cache expiration is a constant problem for developers. When should content be expired? Should we “sweep” stale content? How do we balance serving fresh content and maintaining fast response times? These are difficult questions for developers, and are usually answered with ugly code added across our models, views, and controllers. Instead of designing an elaborate caching and expiring system, an alternate approach is to keep a “hot” cache.

So what is a “hot” cache? A hot cache is what you get when you stop trying to expire content manually, and instead focus on replacing old content with fresh data as soon as it becomes stale. Keeping a hot cache means no difficult expiration logic spread out across your app, and it all but eliminates cache misses.

The problem with this approach until now has been its impact on response times. If a request has to wait for every page it has made stale to be re-rendered, the response can slow down dramatically. Thankfully, Merb has the run_later method, which lets the fresh content render after the response has been sent to the browser. It’s the best of both worlds. Here’s an example.


  class Tags < Merb::Controller

    cache :index
    eager_cache :create, :index

    def index
      display Tag.all
    end

    def create(slug)
      @tag = Tag.create(:slug => slug)

      # redirect them back to the index action
      redirect url(:tags)
    end
  end

The controller will eager_cache the index action whenever the create action is called successfully. If the client were to post a new tag to the create action, they would be redirected back to the index action. Right after the response had been sent to the client, the index action would be rendered with the newly created tag included and replaced in the cache. So when the user’s request for the index action reaches the server, the freshest version is already in the cache, and the cache miss is avoided. This works regardless of how the index action is cached.
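The eager-caching flow can be modeled with a background queue standing in for `run_later`. `DeferredRunner` and `TagsSketch` are hypothetical and are not Merb's implementation; the sketch only shows the ordering: `create` returns its redirect immediately, the index is re-rendered afterwards, and the following `index` request is served from the cache without rendering again.

```ruby
# Hypothetical stand-in for run_later: jobs are drained by a background thread.
class DeferredRunner
  def initialize
    @jobs = Queue.new
    @worker = Thread.new do
      while (job = @jobs.pop) != :stop
        job.call
      end
    end
  end

  def run_later(&blk)
    @jobs << blk
  end

  # Wait for queued jobs to finish (only needed for the demo).
  def shutdown
    @jobs << :stop
    @worker.join
  end
end

# Hypothetical controller-like object with an eagerly cached index.
class TagsSketch
  attr_reader :cache, :renders

  def initialize(runner)
    @runner = runner
    @tags = []
    @cache = {}
    @renders = 0
  end

  def create(slug)
    @tags << slug
    # Respond first; the fresh index is rendered after the response is "sent".
    @runner.run_later { @cache["/tags"] = render_index }
    [302, "/tags"]
  end

  # Serve from the cache, rendering only on a miss.
  def index
    @cache["/tags"] ||= render_index
  end

  private

  def render_index
    @renders += 1
    "tags: #{@tags.join(', ')}"
  end
end

runner = DeferredRunner.new
app = TagsSketch.new(runner)
app.create("ruby")   # returns the redirect without rendering the index
runner.shutdown      # the deferred render has now filled the cache
app.index            # cache hit; no second render
```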

That’s All for Now

There should be support for fetching partials and fragments in the near future, along with a boatload of new caching strategies. Hint: look for a caching strategy that handles internal redirects in both Nginx & Apache, for serving static files that require authentication. A couple of strategies to handle the dog-pile effect are also in the works.

Comments

On September 07, 2008 at 21:35 jackdempsey says:

Ben,

This is some excellent stuff. I look forward to trying it out tomorrow. Thanks for all your work on this!

Jack

On September 08, 2008 at 06:58 rasputnik says:

You have a typo at the top of app/models/tag.rb – ‘Merb::Cache[:memcached].’ should be ‘Merb::Cache[:memcache].’

On September 08, 2008 at 07:00 rasputnik says:

Also, ‘Time.now.to_i / 30) ’ for the interval (in the same file) looks wrong?

On September 08, 2008 at 08:13 benburkert says:

@rasputnik thanks, I just fixed the memcache/d inconsistencies.

Why does ‘Time.now.to_i / 30’ look wrong to you? That should return the same value for 30 seconds, which has the effect of only using a cache entry for 30 seconds.

On September 09, 2008 at 00:27 tobyo says:

Is interval in seconds? Can’t you just pass 30? Time.new.to_i returns a timestamp, the number of seconds since epoch. Dividing by 30 gives you about 1.2 years worth of seconds.

On September 09, 2008 at 03:30 evolving_jerk says:

Looks promising, thank you. Could you add some more details on “key expansion heavy lifting”? I’d like to know what approach you take.

On September 09, 2008 at 07:05 benburkert says:

@tobyo – Time.new.to_i / 30 will give you the same value for 30 seconds. Depending on the Store you are using, it would generate a key similar to "#{id}--interval=#{Time.new.to_i / 30}", causing the key to change every 30 seconds.

@evolving_jerk – the generation of keys is the responsibility of each Store. All stores accept both a key & a parameters hash, from which the actual key string used by the persistence layer is generated. But the key parameter doesn’t have to be a plain string. In fact, in action & page caching, the key parameter is the instance of the controller you want to cache. The PageStrategy uses the request’s URI for the key string; the ActionStrategy uses the controller & action name.

On December 24, 2008 at 18:23 otto says:

What about eager caching in a typical blog situation? You want to eager cache Post#index, and dirty it for Post#create, or possibly Post#update. But in most situations, the update and create actions are in an Admin controller. How do you tell a controller that it needs to dirty whenever a different controller’s action is called?
